This is the data behind the EDITS work. It follows four steps. For Step 1 I look at uptake rates between 2010 and 2021 in three countries: US, UK and Germany. For the US, I use the PSID which tracks households between 2011- 2021 on a biannual basis. For Germany, I use SOEP which tracks households between 2010- 2021 on an annual basis, and for the UK I use Understanding Society which tracks households between 2010 - 2021 on an annual basis. All data was downloaded in September 2024 and cleaning files are on github.
8 technologies selected are :Internet Access, own a smartphone, Home has Solar, Home switched from Gas/Coal/Wood to electric (fuel switching), Home renovation/ retrofitting, EV/ Hybrid car, Commuting habit (walking, biking), digital skills/ capabilities (daily use of internet).
Note: Significant limitations on technologies surveyed. With better data, I could also do this for smart-meters, air conditioners, etc.
This provides a snapshot for current uptake across the households assessing heterogeneity across income level.
For the US, I have information for solar ownership, EV/ hybrid ownership, smartphone ownership, internet access, and commute habit. I create an additional variable on fuel switching by looking at heating method and seeing where moved from wood, oil, gas, coal to electric heating. I also create a variable for retrofitting based on home renovations and skills based on survey responses where RP responds they use the internet daily.
Note: Smartphone ownership and digital skills questions were only added to the survey in 2015.
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Clean heating is interesting in that high income households had less electric only and marginally increased over 2011 - 2021 time period, though they still did increase within decile 9 and 10. Solar panel infiltration is lower than national average at 1% with EIA estimating 4%.
For the UK, I have solar, EV/ hybrid, smartphone, internet, commute and skills. I create fuel switching in the same manner as for US, buy looking at home energy bills and looking for those households that switch from gas/coal to electric only.
Solar question is only asked in 2009, 2012, 2018, and 2021. EV is asked in 2012, 2015, 2018, and 2021. Smartphone begins in 2013. Skills starts in 2011.
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
For Germany I have information on solar, EV, retrofitting, smartphone, internet.
EV only exists in 2015 and 2020. (note it is full EV or biodiesel, no hybrid category) Retrofitting is asked from 2010 - 2015, and 2019 Smartphone
Error in select(., -internet_new) : unused argument (-internet_new)
Not all technologies in each country have the same year of start and end data. I now get data as close to 2015 start and 2021 end as possible. Look at year ranges available for uptake rates for each country and technology at decile level.
There are two outlier technologies for uptake rates. The first is Sustainable Transit in the UK. UK commute (sustainable transport) only shows commuting patterns for those who work, so potentially hides significant commuting preferences, particularly for those out of work or job seeking and is being compared against COVID so less people were commuting to work in 2021, no matter their method. The second is home renovations in Germany. If i cange the year to later, the figure does go up, but 2010 shows a renovation boom that is lost in later years.
For fun, I’m also looking at US state variation, but this is for PhD, please ignore in meeting.
In the baseline model I assume no targeted policy to change distribution (i.e. 3% for 10th decile vs 15% for 1st in 2020 with a constant growth projection to 2050). Model aims to reach exogenously government provided diffusion target for each technology (e.g.40% for housing retrofit vs 60% for EVs).
# Load required packages
library(tidyverse)
library(nls2)
# Function to fit logistic growth model for a specific decile
forecast_logistic_growth <- function(data, tech_col, decile_val, forecast_year = 2050) {
# Create model data for specific decile
model_data <- tibble(
year = data$year[data$decile == decile_val],
adoption = data[[tech_col]][data$decile == decile_val]
) %>%
arrange(year) %>%
filter(!is.na(adoption))
# Normalize years to start from 0 for better model fitting
model_data$t <- model_data$year - min(model_data$year)
# Get initial parameter estimates
K_guess <- max(100, max(model_data$adoption) * 1.2) # Carrying capacity
r_guess <- 0.5 # Growth rate
N0_guess <- model_data$adoption[1] # Initial adoption
tryCatch({
# Fit logistic growth model using nls
# Formula: N(t) = K / (1 + ((K - N0)/N0) * exp(-r*t))
model <- nls(adoption ~ K / (1 + ((K - N0)/N0) * exp(-r * t)),
data = model_data,
start = list(K = K_guess, r = r_guess, N0 = N0_guess),
control = nls.control(maxiter = 1000))
# Generate future years for prediction
future_years <- seq(min(model_data$year), forecast_year, by = 1)
future_t <- future_years - min(model_data$year)
# Extract parameters
params <- coef(model)
K <- params["K"]
r <- params["r"]
N0 <- params["N0"]
# Generate predictions
predictions <- tibble(
year = future_years,
t = future_t,
predicted_adoption = K / (1 + ((K - N0)/N0) * exp(-r * t)),
decile = decile_val
)
list(
predictions = predictions,
model = model,
parameters = params,
data = model_data,
K = K,
r = r,
N0 = N0
)
}, error = function(e) {
cat("Error fitting model for decile", decile_val, ":", e$message, "\n")
NULL
})
}
# Function to plot growth curves by decile
plot_logistic_growth <- function(forecasts_list, tech_col) {
# Combine all predictions
all_predictions <- bind_rows(
lapply(forecasts_list, function(x) x$predictions)
)
# Combine all historical data
all_data <- bind_rows(
lapply(forecasts_list, function(x) {
x$data %>% mutate(decile = x$predictions$decile[1])
})
)
# Create plot
ggplot() +
# Historical data points
geom_point(data = all_data,
aes(x = year, y = adoption, color = factor(decile)),
alpha = 0.6, size = 2) +
# Forecast lines
geom_line(data = all_predictions,
aes(x = year, y = predicted_adoption, color = factor(decile))) +
scale_color_viridis_d(name = "Income Decile") +
theme_minimal() +
labs(
title = paste("Logistic Growth Forecast by Income Decile:",
gsub("_percent", "", tech_col)),
subtitle = "Historical data and logistic growth projections to 2050",
x = "Year",
y = "Adoption Rate (%)"
) +
scale_x_continuous(breaks = seq(2010, 2050, by = 5)) +
ylim(0, 100) +
theme(
panel.grid.minor = element_blank(),
legend.position = "right"
)
}
# Define technologies
technologies <- c("smartphone_percent", "hybrid_percent", "solar_percent",
"internet_percent", "repairs_percent", "skills_percent",
"fuel_switch_percent")
# Create nested list for results
forecasts_by_decile <- list()
# Fit models for each technology and decile
for(tech in technologies) {
cat("\nProcessing", tech, "\n")
# Store forecasts for each decile
decile_forecasts <- list()
for(d in 1:10) {
cat(" Decile", d, "\n")
decile_forecasts[[d]] <- forecast_logistic_growth(us_props, tech, d)
}
# Store and plot if we have successful fits
if(any(!sapply(decile_forecasts, is.null))) {
forecasts_by_decile[[tech]] <- decile_forecasts
# Create and print plot
plot <- plot_logistic_growth(decile_forecasts[!sapply(decile_forecasts, is.null)], tech)
print(plot)
# Print model parameters and predictions
cat("\nLogistic Growth Parameters for", gsub("_percent", "", tech), ":\n")
parameter_summary <- bind_rows(lapply(decile_forecasts[!sapply(decile_forecasts, is.null)], function(x) {
tibble(
decile = x$predictions$decile[1],
carrying_capacity = round(x$K, 1),
growth_rate = round(x$r, 3),
initial_adoption = round(x$N0, 1),
adoption_2030 = round(x$predictions$predicted_adoption[x$predictions$year == 2030], 1),
adoption_2040 = round(x$predictions$predicted_adoption[x$predictions$year == 2040], 1),
adoption_2050 = round(x$predictions$predicted_adoption[x$predictions$year == 2050], 1)
)
}))
print(parameter_summary %>% arrange(decile))
}
}
# Create historical price dataset (EIA Average Retail Electricity and Gas 2005 - 2022)
historical_prices <- tibble(
year = 2005:2022,
elec_price = c(9.45, 10.40, 10.65, 11.26, 11.51, 11.54, 11.72, 11.88, 12.13,
12.52, 12.65, 12.55, 12.89, 12.87, 13.01, 13.15, 13.72, 14.77), # cents per kWh
gas_price = c(13.83, 13.73, 13.08, 13.89, 12.14, 11.39, 11.03, 10.71, 10.32,
10.97, 10.38, 10.05, 10.91, 10.50, 10.44, 10.78, 11.47, 15.95) # dollars per thousand cubic feet
)
# Merge with US dataset
US <- US %>%
left_join(historical_prices, by = "year")
# Print summary of the new columns
cat("Summary of added energy price columns:\n")
summary(US[c("elec_price", "gas_price")])
# Create quick visualization of price trends
ggplot(historical_prices, aes(x = year)) +
geom_line(aes(y = elec_price, color = "Electricity"), size = 1) +
geom_line(aes(y = gas_price, color = "Natural Gas"), size = 1) +
theme_minimal() +
labs(
title = "Historical Energy Prices (2005-2022)",
y = "Price",
color = "Energy Type"
) +
scale_color_manual(values = c("Electricity" = "blue", "Natural Gas" = "red")) +
theme(legend.position = "bottom") +
annotate("text", x = max(historical_prices$year), y = max(historical_prices$elec_price),
label = "cents per kWh", hjust = 1, vjust = -0.5) +
annotate("text", x = max(historical_prices$year), y = max(historical_prices$gas_price),
label = "dollars per thousand cubic feet", hjust = 1, vjust = -0.5)
# Calculate energy consumption
US <- US %>%
mutate(
# Convert electricity expenditure to kWh
# elec_price is in cents per kWh, so multiply by 100 to convert to dollars
elec_consumption_kwh = (elec_a_exp / (elec_price/100)),
# Convert gas expenditure to thousand cubic feet
gas_consumption_tcf = gas_a_exp / gas_price,
# Calculate consumption per room (to account for house size)
elec_consumption_per_room = elec_consumption_kwh / hh_rooms,
gas_consumption_per_room = gas_consumption_tcf / hh_rooms
)
# Create summary statistics
consumption_summary <- US %>%
group_by(year) %>%
summarise(
avg_elec_kwh = mean(elec_consumption_kwh, na.rm = TRUE),
median_elec_kwh = median(elec_consumption_kwh, na.rm = TRUE),
avg_gas_tcf = mean(gas_consumption_tcf, na.rm = TRUE),
median_gas_tcf = median(gas_consumption_tcf, na.rm = TRUE),
avg_elec_per_room = mean(elec_consumption_per_room, na.rm = TRUE),
avg_gas_per_room = mean(gas_consumption_per_room, na.rm = TRUE)
)
# Print summary statistics
print("Average Annual Energy Consumption:")
print(consumption_summary)
# Create visualization of consumption trends
ggplot(consumption_summary, aes(x = year)) +
geom_line(aes(y = avg_elec_kwh, color = "Electricity (kWh)"), size = 1) +
geom_line(aes(y = avg_gas_tcf * 1000, color = "Gas (cf)"), size = 1) + # Multiply by 1000 to convert to cubic feet
theme_minimal() +
labs(
title = "Average Household Energy Consumption Over Time",
y = "Consumption",
color = "Energy Type"
) +
scale_color_manual(values = c("Electricity (kWh)" = "blue", "Gas (cf)" = "red")) +
theme(legend.position = "bottom")
# Create boxplot of consumption by house size
ggplot(US, aes(x = factor(hh_rooms), y = elec_consumption_kwh)) +
geom_boxplot(alpha = 0.5) +
theme_minimal() +
labs(
title = "Electricity Consumption by Number of Rooms",
x = "Number of Rooms",
y = "Annual Electricity Consumption (kWh)"
) +
theme(legend.position = "none")
# Basic sanity checks (print ranges to check for unreasonable values)
cat("\nConsumption Ranges (removing outliers):\n")
consumption_ranges <- US %>%
summarise(
elec_kwh_min = quantile(elec_consumption_kwh, 0.05, na.rm = TRUE),
elec_kwh_max = quantile(elec_consumption_kwh, 0.95, na.rm = TRUE),
gas_tcf_min = quantile(gas_consumption_tcf, 0.05, na.rm = TRUE),
gas_tcf_max = quantile(gas_consumption_tcf, 0.95, na.rm = TRUE)
)
print(consumption_ranges)
# First, generate forecasts for each technology and decile
forecast_logistic_growth <- function(data, tech_col, decile_val, forecast_year = 2050) {
# Create model data for specific decile
model_data <- tibble(
year = data$year[data$decile == decile_val],
adoption = data[[tech_col]][data$decile == decile_val]
) %>%
arrange(year) %>%
filter(!is.na(adoption))
# Normalize years to start from 0 for better model fitting
model_data$t <- model_data$year - min(model_data$year)
tryCatch({
# Get initial parameter estimates
K_guess <- max(100, max(model_data$adoption) * 1.2) # Carrying capacity
r_guess <- 0.5 # Growth rate
N0_guess <- model_data$adoption[1] # Initial adoption
# Fit logistic growth model using nls
model <- nls(adoption ~ K / (1 + ((K - N0)/N0) * exp(-r * t)),
data = model_data,
start = list(K = K_guess, r = r_guess, N0 = N0_guess),
control = nls.control(maxiter = 1000))
# Generate future years for prediction
future_years <- seq(min(model_data$year), forecast_year, by = 1)
future_t <- future_years - min(model_data$year)
# Extract parameters
params <- coef(model)
K <- params["K"]
r <- params["r"]
N0 <- params["N0"]
# Generate predictions
predictions <- tibble(
year = future_years,
t = future_t,
predicted_adoption = K / (1 + ((K - N0)/N0) * exp(-r * t)),
decile = decile_val
)
return(list(
predictions = predictions,
model = model,
parameters = params,
data = model_data,
K = K,
r = r,
N0 = N0
))
}, error = function(e) {
cat("Error fitting model for decile", decile_val, ":", e$message, "\n")
return(NULL)
})
}
# Generate forecasts
technologies <- c("hybrid_prop", "solar_prop", "fuel_switch_prop")
forecasts_by_decile <- list()
for(tech in technologies) {
cat("\nGenerating forecasts for", tech, "\n")
tech_forecasts <- list()
for(d in 1:10) {
cat(" Processing decile", d, "\n")
tech_forecasts[[d]] <- forecast_logistic_growth(us_props, tech, d)
}
forecasts_by_decile[[tech]] <- tech_forecasts
}
# Now calculate energy impacts
# Get current adoption rates
max_year <- max(us_props$year)
current_rates <- data.frame(
decile = us_props$decile[us_props$year == max_year],
current_hybrid = us_props$hybrid_prop[us_props$year == max_year],
current_solar = us_props$solar_prop[us_props$year == max_year],
current_fuel_switch = us_props$fuel_switch_prop[us_props$year == max_year]
)
# Calculate baseline consumption
baseline_consumption <- US %>%
group_by(decile) %>%
summarise(
avg_energy_with_hybrid = mean(elec_consumption_kwh[hybrid == 1], na.rm = TRUE),
avg_energy_no_hybrid = mean(elec_consumption_kwh[hybrid == 0], na.rm = TRUE),
hybrid_impact = avg_energy_with_hybrid - avg_energy_no_hybrid,
avg_energy_with_solar = mean(elec_consumption_kwh[heat_method == 6], na.rm = TRUE),
avg_energy_no_solar = mean(elec_consumption_kwh[heat_method != 6], na.rm = TRUE),
solar_impact = avg_energy_with_solar - avg_energy_no_solar,
avg_energy_with_elec_heat = mean(elec_consumption_kwh[heat_method == 2], na.rm = TRUE),
avg_energy_no_elec_heat = mean(elec_consumption_kwh[heat_method != 2], na.rm = TRUE),
elec_heat_impact = avg_energy_with_elec_heat - avg_energy_no_elec_heat
)
# Function to calculate energy impact
calculate_energy_impact <- function(current_adoption, future_adoption, energy_impact, households = 1000) {
additional_adopters = (future_adoption - current_adoption) * households / 100
total_impact = additional_adopters * energy_impact
return(total_impact)
}
# Calculate impacts
tech_impacts <- list()
impact_counter <- 0
for(tech in technologies) {
cat("\nProcessing impacts for", tech, "\n")
for(d in 1:10) {
cat(" Processing decile", d, "\n")
if(!is.null(forecasts_by_decile[[tech]][[d]])) {
forecast <- forecasts_by_decile[[tech]][[d]]
current <- current_rates[current_rates$decile == d, ]
baseline <- baseline_consumption[baseline_consumption$decile == d, ]
impact_counter <- impact_counter + 1
tech_impacts[[impact_counter]] <- data.frame(
technology = gsub("_prop", "", tech),
decile = d,
additional_energy_2030 = calculate_energy_impact(
current[[paste0("current_", gsub("_prop", "", tech))]],
forecast$predictions$predicted_adoption[forecast$predictions$year == 2030],
switch(tech,
"hybrid_prop" = baseline$hybrid_impact,
"solar_prop" = baseline$solar_impact,
"fuel_switch_prop" = baseline$elec_heat_impact)
),
additional_energy_2040 = calculate_energy_impact(
current[[paste0("current_", gsub("_prop", "", tech))]],
forecast$predictions$predicted_adoption[forecast$predictions$year == 2040],
switch(tech,
"hybrid_prop" = baseline$hybrid_impact,
"solar_prop" = baseline$solar_impact,
"fuel_switch_prop" = baseline$elec_heat_impact)
),
additional_energy_2050 = calculate_energy_impact(
current[[paste0("current_", gsub("_prop", "", tech))]],
forecast$predictions$predicted_adoption[forecast$predictions$year == 2050],
switch(tech,
"hybrid_prop" = baseline$hybrid_impact,
"solar_prop" = baseline$solar_impact,
"fuel_switch_prop" = baseline$elec_heat_impact)
)
)
}
}
}
# Combine results if we have any
if(length(tech_impacts) > 0) {
tech_impacts_df <- do.call(rbind, tech_impacts)
# Create visualization
ggplot(tech_impacts_df %>%
pivot_longer(cols = starts_with("additional_energy"),
names_to = "year",
values_to = "energy_impact") %>%
mutate(year = as.numeric(gsub("additional_energy_", "", year))),
aes(x = factor(decile), y = energy_impact, fill = technology)) +
geom_col(position = "dodge") +
facet_wrap(~year) +
theme_minimal() +
labs(
title = "Projected Additional Energy Demand from Technology Adoption",
subtitle = "By income decile and technology type",
x = "Income Decile",
y = "Additional Annual Energy Consumption (kWh)",
fill = "Technology"
) +
theme(legend.position = "bottom")
# Print summary statistics
cat("\nSummary of Additional Energy Demand (kWh):\n")
print(tech_impacts_df %>%
group_by(technology) %>%
summarise(
total_2030 = sum(additional_energy_2030, na.rm = TRUE),
total_2040 = sum(additional_energy_2040, na.rm = TRUE),
total_2050 = sum(additional_energy_2050, na.rm = TRUE)
))
}